Sub-quadratic Markov tree mixture learning based on randomizations of the Chow-Liu algorithm

Authors

  • Sourour Ammar
  • Philippe Leray
  • Louis Wehenkel
Abstract

The present work analyzes different randomized methods to learn Markov tree mixtures for density estimation in very high-dimensional discrete spaces (i.e., with a very large number n of discrete variables) when the sample size N is very small compared to n. Several sub-quadratic relaxations of the Chow-Liu algorithm are proposed, each weakening its search procedure. We first study naïve randomizations and then gradually make the algorithms more deterministic by focusing on the most interesting edges, either by retaining the best edges between models or by inferring promising relationships between variables. We compare these methods to totally random tree generation and to randomization based on bootstrap resampling (bagging), of linear and quadratic complexity, respectively. Our results show that randomization becomes increasingly advantageous as the ratio N/n decreases, and that methods that simultaneously discover and exploit the problem structure are promising in this context.
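The abstract contrasts the quadratic Chow-Liu algorithm with cheaper randomized variants. The Python sketch below illustrates, under stated assumptions, the standard Chow-Liu step (a maximum-weight spanning tree over pairwise empirical mutual information) and three ways of producing the trees of a mixture: bagging, scoring only a random subset of candidate edges, and drawing a totally random tree. All function names, the uniform random-pair sampling, and the random-attachment tree generator are hypothetical illustrations, not the authors' exact procedures; the edge-retention and structure-inference variants mentioned above are not shown, and parameter estimation and mixture weights are omitted.

```python
import math
import random
from collections import Counter
from itertools import combinations


def mutual_information(data, i, j):
    """Empirical mutual information (in nats) between discrete columns i and j."""
    n = len(data)
    joint = Counter((row[i], row[j]) for row in data)
    pi = Counter(row[i] for row in data)
    pj = Counter(row[j] for row in data)
    # Sum over observed (a, b) of p(a,b) * log(p(a,b) / (p(a) p(b))), written with counts.
    return sum((c / n) * math.log(c * n / (pi[a] * pj[b])) for (a, b), c in joint.items())


def max_weight_spanning_tree(n_vars, scored_edges):
    """Kruskal's algorithm with union-find on (weight, i, j) triples.

    Returns a spanning tree if the candidate edges connect all variables,
    otherwise a maximum-weight spanning forest.
    """
    parent = list(range(n_vars))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    tree = []
    for _, i, j in sorted(scored_edges, reverse=True):
        ri, rj = find(i), find(j)
        if ri != rj:  # adding (i, j) creates no cycle
            parent[ri] = rj
            tree.append((i, j))
    return tree


def chow_liu(data, n_vars, candidate_edges=None):
    """Chow-Liu structure: MWST over pairwise mutual information.

    candidate_edges=None scores all n(n-1)/2 pairs (quadratic in n);
    passing a smaller subset gives a cheaper, randomized relaxation.
    """
    if candidate_edges is None:
        candidate_edges = combinations(range(n_vars), 2)
    scored = [(mutual_information(data, i, j), i, j) for i, j in candidate_edges]
    return max_weight_spanning_tree(n_vars, scored)


def random_edge_subset(n_vars, k, rng):
    """Hypothetical relaxation: k distinct pairs drawn uniformly, without listing all pairs."""
    k = min(k, n_vars * (n_vars - 1) // 2)
    edges = set()
    while len(edges) < k:
        i, j = rng.sample(range(n_vars), 2)
        edges.add((min(i, j), max(i, j)))
    return sorted(edges)


def random_tree(n_vars, rng):
    """Random tree in O(n): attach each node, in shuffled order, to an earlier one.

    Not uniform over all labeled trees; an illustrative assumption.
    """
    order = list(range(n_vars))
    rng.shuffle(order)
    return [(order[rng.randrange(k)], order[k]) for k in range(1, n_vars)]


def learn_tree_ensemble(data, n_vars, m, method, k=None, seed=0):
    """Learn m tree structures for a mixture (weights and parameters not estimated here)."""
    rng = random.Random(seed)
    trees = []
    for _ in range(m):
        if method == "bagging":  # bootstrap resample, then full Chow-Liu
            sample = [rng.choice(data) for _ in data]
            trees.append(chow_liu(sample, n_vars))
        elif method == "random_edges":  # Chow-Liu restricted to k random pairs
            trees.append(chow_liu(data, n_vars, random_edge_subset(n_vars, k, rng)))
        elif method == "random_tree":  # structure ignores the data entirely
            trees.append(random_tree(n_vars, rng))
        else:
            raise ValueError(f"unknown method: {method}")
    return trees
```

With `k` chosen well below n(n-1)/2 (for instance on the order of n log n), the random-edge variant scores far fewer pairs than the full algorithm. If the sampled pairs do not connect all variables, Kruskal's algorithm returns a forest, which is still a valid tree-structured (Markov forest) model.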

Similar resources

Towards sub-quadratic learning of probability density models in the form of mixtures of trees

We consider randomization schemes of the Chow-Liu algorithm from weak (bagging, of quadratic complexity) to strong ones (full random sampling, of linear complexity), for learning probability density models in the form of mixtures of Markov trees. Our empirical study on high-dimensional synthetic problems shows that, while bagging is the most accurate scheme on average, some of the stronger rand...

Probability Density Estimation by Perturbing and Combining Tree Structured Markov Networks

To explore the Perturb and Combine idea for estimating probability densities, we study mixtures of tree structured Markov networks derived by bagging combined with the Chow and Liu maximum weight spanning tree algorithm, or by pure random sampling. We empirically assess the performances of these methods in terms of accuracy, with respect to mixture models derived by EM-based learning of Naive B...

Mixtures of Bagged Markov Tree Ensembles

Key points: • Trees → efficient algorithms. • Mixture → improved modeling. There are 2 approaches to improve over a single Chow-Liu tree: bias reduction, e.g. the EM algorithm [1]. • Learning the mixture is viewed as a global optimization problem aiming at maximizing the data likelihood. • There is a bias-variance trade-off associated with the number of terms. • It leads to a partition of the learning s...

Conditional Chow-Liu Tree Structures for Modeling Discrete-Valued Vector Time Series

We consider the problem of modeling discrete-valued vector time series data using extensions of Chow-Liu tree models to capture both dependencies across time and dependencies across variables. We introduce conditional Chow-Liu tree models, an extension to standard Chow-Liu trees, for modeling conditional rather than joint densities. We describe learning algorithms for such models and show how t...

A Generalization of the Chow-Liu Algorithm and its Application to Statistical Learning http://arxiv.org/abs/1002.2240

Learning statistical knowledge from data requires considerable computation. We eventually compromise between the accuracy and the time complexity of the learning algorithms by choosing an approximation to the best solution. In this paper, we address how to efficiently estimate the dependency relation among attribute values by constructing an undirected graph (a Markov network) via the Chow-Liu algorithm...

Publication date: 2010